The complete nucleotide sequence of polyprotein gene 1 and the assembled full-length genome sequence
are presented for turkey coronavirus (TCoV) isolates 540 and ATCC. The TCoV polyprotein gene encodes
two open reading frames (ORFs), which are translated into two products, pp1a and pp1ab, the latter being
produced via -1 frameshift translation. TCoV polyproteins pp1a and pp1ab were predicted to be processed
into 15 non-structural proteins (nsp2-nsp16), with nsp1 missing. ClustalW analysis revealed 88.99% identity
and 96.99% similarity for pp1ab between TCoV and avian infectious bronchitis virus (IBV) at the amino
acid level. The whole genome consists of 27,749 nucleotides for 540 and 27,816 nucleotides for ATCC,
excluding the poly(A) tail. A total of 13 ORFs were predicted for TCoV. Five subgenomic RNAs were detected
in ATCC-infected turkey small intestines by Northern blotting. The whole genome sequence had 86.9%
identity between TCoV and IBV, supporting the classification of TCoV as a group 3 coronavirus.

© 2008 Elsevier B.V. All rights reserved. 


1. Introduction 

Turkey coronavirus (TCoV) is a causative agent of bluecomb disease
in turkey poults. The outbreak of the disease was first reported
more than 40 years ago, and the viral agent responsible for the disease
was identified as turkey coronavirus in 1973 (Ritchie et al.,
1973). TCoV infects the small intestine of turkey poults and
disrupts the infected tissue, resulting in reduced intestinal surface
area, reduced food consumption, and an apparent decrease in the
body weight of infected turkeys. The mortality rate is low, however.
Outbreaks of TCoV have mostly been reported from turkey farms
in the US and Europe (Cavanagh, 2005; Nagaraja and Pomeroy,
1997). Based on the antigenic relationship between TCoV and other
coronaviruses, TCoV was classified with avian infectious bronchitis
virus (IBV), which infects chickens, as a group 3 coronavirus within
the genus Coronavirus, family Coronaviridae, and order Nidovirales
(Gonzalez et al., 2003).

The coronavirus genome consists of a single positive-strand RNA
((+)ssRNA) molecule of about 27-33 kilobases (kb) with a cap
at the 5' end and a poly(A) tail at the 3' end (Boursnell et al., 1987; Lai
and Stohlman, 1981). Four structural genes are encoded by all
coronavirus genomes sequenced so far: spike protein (S),
envelope protein (E), matrix protein (M), and nucleocapsid protein
(N). The genome organization of coronaviruses is
5'-polymerase-S-E-M-N-3'. An untranslated region (UTR) is located at both the
5' and 3' ends of the genome. The structural proteins are produced
through transcription of a set of co-terminal subgenomic
mRNAs (sgRNAs). The molecular mechanisms of genome replication
and transcription are not fully understood, but the discontinuous
negative-strand extension model has gained wide acceptance
(Sawicki and Sawicki, 1995; Sawicki et al., 2007).

The polymerase gene accounts for about two-thirds of the
genome (20-22 kb) and consists of two open reading frames
(ORFs): ORF1a and ORF1b (Boursnell et al., 1987). The polymerase
is necessary and sufficient for genome replication and transcription
because purified viral RNA or viral RNA transcribed in vitro
from a cDNA construct is infectious when transfected into
permissive cells (Yount et al., 2000). However, nucleocapsid protein
greatly enhanced coronavirus genome replication (Almazan et al.,
2004; Schelle et al., 2005), suggesting that nucleocapsid protein
may have a regulatory role in coronavirus replication. When viral
genomic RNA enters the host cell, the ORF1a product (pp1a) and
polyprotein 1ab (pp1ab) are translated first, the latter
through a -1 frameshift translation mechanism (Bredenbeek et
al., 1990; Brierley et al., 1989; Herold and Siddell, 1993; Lee et al.,
1991). Coronavirus pp1ab contains a 3C-like proteinase (3CLpro)
and a papain-like proteinase (PLP) that cleave themselves
from the polyprotein and further process the polyprotein into
15-16 non-structural proteins (nsps) (Weiss et al., 1994),
including an RNA-dependent RNA polymerase (RdRp, nsp12), an
NTPase/helicase (nsp13) for unwinding dsRNA, and three other
proteins recently predicted to be a nuclease ExoN homolog (nsp14),
an endoRNase (nsp15), and a 2'-O-methyltransferase (2'-O-MT,
nsp16) (Snijder et al., 2003). The biochemical functions of some of
these enzymes were recently characterized (Bhardwaj et al., 2004;
Ivanov et al., 2004a,b; Ivanov and Ziebuhr, 2004; Minskaia et al.,
2006; Snijder et al., 2003), and their in vivo roles are under
investigation.

Recently, our lab reported the completion of the 3' end sequence
of four isolates of TCoV (Lin et al., 2004; Loa et al., 2006).
The sequence revealed four structural genes (E, M, N, and S) and
four accessory ORFs designated 3a, 3b, 5a, and 5b (Breslin et
al., 1999a,b; Lin et al., 2004). Sequence analysis indicated that the
M and N sequences of TCoV shared over 80% sequence identity
with those of IBV. However, the S gene shared less than 40%
sequence similarity with any known coronavirus S gene (Lin et al.,
2004). These results suggest that TCoV may have diverged from IBV
during evolution. In this study, we determine and analyze
the nucleotide sequence of the polyprotein gene of TCoV and
use bioinformatics to predict potential functional domains encoded
by TCoV polyprotein 1ab (pp1ab). The polymerase gene sequence
is then combined with the structural gene sequence to assemble the
full-length genome sequence of TCoV.

2. Materials and methods 

2.1. Viruses 

TCoV isolate ATCC (Minnesota strain) was obtained from the American
Type Culture Collection (ATCC, Manassas, VA). The TCoV isolate
540 used in the present study was recovered from fecal contents
and intestines of turkey poults with acute coronaviral enteritis in
Indiana, USA in 1994. The viruses were propagated in 22-day-old
embryonating turkey eggs. The presence of TCoV in the intestines
of embryos was confirmed by TCoV-specific immunofluorescence
antibody assays and electron microscopy at the Indiana State Animal
Disease Diagnostic Laboratory in West Lafayette, IN, USA (Lin
et al., 2004). Viruses were purified from small intestine following
a published method (Loa et al., 2002) and either used immediately
or stored at -80 °C for further use.

2.2. RNA isolation and cDNA synthesis 

Viral genomic RNA was purified with RNApure reagent (GenHunter).
Briefly, 0.2 ml of virus suspension was mixed with 1 ml
of RNApure reagent followed by chloroform extraction. RNA was
finally precipitated with isopropanol and washed with 70% ethanol.
The RNA pellet was air dried, dissolved in 30 µl of DEPC-H2O, and
used for cDNA synthesis with the Superscript RT II system with random
hexamer or oligo dT18 (for 3' RACE) (Invitrogen). The synthesized
cDNA was treated with RNase A to digest viral RNA and then served
as template for PCR.

2.3. PCR amplification 

To clone the whole 1ab gene, the following strategies were
employed. The first was to amplify a 900-bp conserved RdRp
sequence based on Stephensen’s report (Stephensen et al., 1999).
Then, long PCR was used to amplify the region between RdRp and
the spike gene. Based on the sequence results, bioinformatics analysis
was used to design PCR primers to amplify the remaining sequence
of the ORF1a gene. The Expand LA PCR system (Roche) was used for all
PCR amplifications. The PCR reaction consisted of 1x PCR buffer,
1.7 mM MgCl2, 500 nM each of dNTPs, 200 pmol of each primer, 2 µl
of cDNA, and 0.25 unit of DNA polymerase in a final volume of 50 µl.
The PCR was performed on a Tetra machine (MJ Research) with the
following conditions: initial denaturation at 94 °C for 3 min;
denaturation at 93 °C for 10 s, annealing at 55 °C for 30 s, and extension at
68 °C for 5-6 min, for a total of 30 cycles; and a final extension at 68 °C
for 10 min. The PCR product was purified with the Qiagen PCR purification
Kit (Qiagen), cloned into the pCRII-TOPO vector, and transformed into
TOP10F cells (Invitrogen). The plasmid was prepared with the QIAquick
Spin Miniprep Kit (Qiagen) and submitted for DNA sequencing
at the Purdue Genomic Center (Purdue University, West Lafayette, IN,
USA). At least two independent colonies were sequenced for each
sequence. All PCR primers are listed in Supplementary Table S1 and
are available upon request.

2.4. Amplification of 5' and 3' ends by RACE 

To amplify the 5' end of the TCoV genome, the 5' RACE system for rapid
amplification of cDNA ends (Invitrogen) was employed, except that
Expand LA polymerase was used in the PCR. Random primers were
used to synthesize cDNA from ATCC and 540 RNA. The cDNA was
treated with RNase mix and purified with a GlassMax spin cartridge
according to the manufacturer’s protocol (Invitrogen). The 3' end of
the cDNA was tailed with dCTP by TdT. After tailing, PCR
was performed with primers AAP (GGC CAC GCG TCG ACT AGT ACG
GGI IGG GII GGG IIG) and IBPR2 (TGG CAC TAC CCC CTA CAA AC).
The amplified PCR product was analyzed and cloned for sequencing
as described in the previous section for PCR amplification.

To amplify the 3' end of the TCoV genome, oligodT18 was used to
synthesize cDNA from genomic RNA of ATCC and 540. After degradation
of RNA with RNase mix, the cDNA was used as template
for PCR with primers oligodT15 and AT3endF (TGGAATTTGATGATGAACC,
96 nt upstream of the stop codon of the ATCC N gene). The PCR
product was treated in the same way as above.

To obtain the leader-body junction sequence of each subgenomic
mRNA (sgRNA), primers TCVF
(ACTAAAGATAGATATTAATATATATCTATTGCACTAGCC) and TCVsgR1
(AAACCAAGATGCATTTCC) were used to amplify the 5' end of the sgRNAs for 3, M, 5, and N.
To amplify the 5' end of the sgRNA for the S gene, TCVF and ATS174
(TCTGGCGGTCTCATAACATCTGGA) were used in PCR. PCR products were
cloned into pCRII-TOPO for sequencing as described in the previous
section.

2.5. Sequence analysis 

All DNA sequences were analyzed with DNAStar software (Madison,
WI, USA) and the ClustalW program (Thompson et al., 1994) or
online software as indicated in the results. The frameshift pseudoknot
was predicted using M-fold (Mathews et al., 1999).

2.6. Polyprotein mapping 

Polyprotein mapping of the TCoV 1ab polyprotein was based on
the predicted 3CLpro and PLP and their substrate preferences as
described for IBV (Liu et al., 1998) and other coronaviruses
(Hegyi and Ziebuhr, 2002; Kiemer et al., 2004). The BLASTp program
(NCBI: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi) and pfman
(www.expasy.org) were used to find sequence similarity and
conserved domains in databases. TMHMM was used to predict
transmembrane domains (http://www.cbs.dtu.dk/services/TMHMM-2.0/).
The nomenclature for the pp1a and pp1ab mapping products (nsps)
followed Ziebuhr (2005) and Ziebuhr et al. (2000).
Table 1

Open reading frames encoded in TCoV-ATCC genome 



2.7. Phylogenetic analysis 

The alignments were performed using ClustalW (Thompson
et al., 1994), and phylogenetic trees were drawn with DNAStar and
the program at http://www.genebee.msu.su/services/phtree_full.html.

Coronavirus sequences used in this article were from NCBI. Their
GenBank accession numbers were:

BCoV, NC_003045; BtCoV, NC_008315; FCoV, NC_007025;
HCoV-229E, NC_002645; HCoV-NL63, NC_005831; HCoV-OC43,
NC_005147; HCoV-HKU1, NC_006577; IBV, NC_001451; MHV,
NC_001846; PEDV, NC_003436; SARS-CoV, NC_004718; TGEV,
NC_002306.

2.8. Northern blotting 

About 10 µg of total RNA isolated from mock- and ATCC-infected
turkey small intestines was separated on a 1% agarose gel and
transferred onto a nitrocellulose membrane. A 32P-CTP (GE Healthcare)-labeled
N gene probe was prepared using the High Prime DNA
Labeling Kit (Roche) with N gene primers N102F and N102R. The membrane
was prehybridized for 2 h at 68 °C and then hybridized
overnight at 68 °C with the 32P-labeled N gene probe. After hybridization,
the membrane was wrapped in Saran Wrap and exposed to
X-ray film for signal development.

3. Results and discussion 

3.1. Nucleotide sequence accession number 

The sequences reported in this work have been deposited in the
GenBank database under accession numbers EU022526 for TCoV-ATCC
and EU022525 for TCoV-540.


Fig. 1. (a) Genome organization of TCoV. Diagram shows putative ORFs. UTRs, leader (L), and TRS are not to scale. Above the genome organization of TCoV,
the predicted five sgRNAs are shown in relative sizes. Genome organization of IBV-Beaudette (NC_001451) is displayed below for comparison with that of TCoV. (b) Mapping of TCoV
polyprotein. Predicted non-structural proteins (nsp2-nsp16) for pp1a are shown in relative sizes (bottom panel). Nsp1 is missing from TCoV and nsp11 contains only 23 aa.
The sequence is for the ATCC isolate (accession number EU022526).
Table 2

Open reading frames encoded in TCoV-540 genome 



3.2. Polyprotein gene of TCoV 

The sequence of polymerase gene 1 of TCoV isolate ATCC contained
20,441 nts, excluding the 5' UTR. Two ORFs were encoded by
gene 1. ORF1a contained 11,874 nts (529-12,402) encoding a protein
of 3957 aa (pp1a); ORF1b contained 7965 nts (12,477-20,441)
encoding a protein of 2654 aa (pp1b) (Table 1). The polyprotein
gene of the TCoV 540 isolate consisted of 20,401 nts excluding the
5' UTR. ORF1a was 11,838 nts (531-12,368), encoding pp1a of 3945
aa, and ORF1b was 7959 nts (12,443-20,401), encoding a protein
of 2652 aa (Table 2). Through -1 frameshift translation, pp1ab was
predicted to contain 6637 aa for ATCC and 6623 aa for 540. The
3' end of ORF1b overlapped with 50 nts at the 5' end of the spike
gene. There were 14 aa missing in 540 pp1ab when compared with
ATCC pp1ab. They were distributed over 7 positions on pp1ab of the
ATCC isolate, i.e. positions 922-923 (2 aa, nsp3); 930 (1 aa, nsp3);
971-973 (3 aa, nsp3); 2306-2307 (2 aa, nsp4); 3226-3229 (4 aa,
nsp6); 4234 (1 aa, nsp12); 5095 (1 aa, nsp13). ClustalW comparison
of the protein sequences of 540 and ATCC showed that the
sequence identities of pp1a, pp1b, and pp1ab were 89.92%, 95.86%,
and 92.26%, respectively. The overall similarities for pp1a, pp1b,
and pp1ab were 97.4%, 98.91%, and 97.97%, respectively.
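
The ORF lengths and protein sizes quoted above follow directly from the reported coordinates. A minimal Python sketch of that arithmetic is given here for readers who wish to re-check it; the function and variable names are illustrative only, and the sketch assumes that each ORF span is 1-based, inclusive, and ends with a single stop codon.

```python
# ORF coordinates (1-based, inclusive) for gene 1, as reported in the text.
ORFS = {
    ("ATCC", "ORF1a"): (529, 12402),
    ("ATCC", "ORF1b"): (12477, 20441),
    ("540", "ORF1a"): (531, 12368),
    ("540", "ORF1b"): (12443, 20401),
}

# (length in nt, protein size in aa) as reported in the text.
REPORTED = {
    ("ATCC", "ORF1a"): (11874, 3957),
    ("ATCC", "ORF1b"): (7965, 2654),
    ("540", "ORF1a"): (11838, 3945),
    ("540", "ORF1b"): (7959, 2652),
}


def orf_length_nt(start, end):
    """Length of an inclusive 1-based span."""
    return end - start + 1


def protein_length_aa(start, end):
    """Amino acid count, assuming the span ends with one stop codon."""
    length = orf_length_nt(start, end)
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3 - 1


# Residues of ATCC pp1ab that are absent from 540 pp1ab (inclusive ranges).
DELETIONS_IN_540 = [(922, 923), (930, 930), (971, 973), (2306, 2307),
                    (3226, 3229), (4234, 4234), (5095, 5095)]
missing_aa = sum(end - start + 1 for start, end in DELETIONS_IN_540)

for key, (start, end) in ORFS.items():
    computed = (orf_length_nt(start, end), protein_length_aa(start, end))
    status = "matches text" if computed == REPORTED[key] else "differs from text"
    print(key, computed, status)
print("aa missing in 540 pp1ab:", missing_aa)
```

The same check shows that the seven deleted positions add up to the 14-aa difference between the two pp1ab products (6637 versus 6623 aa).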

The frameshift “slippery sequence” UUUAAAC (Brierley et al.,
1989) was identified for both ATCC and 540. Both sequences were
located before the end of ORF1a. The sequences downstream of
UUUAAAC were predicted to form a pseudoknot to support the
translational frameshift (Brierley et al., 1989) (Supplementary Fig.
S1). The frameshift position was predicted at the C of UUUAAAC.
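
To illustrate how a -1 frameshift at the slippery sequence yields a longer product, the following Python sketch translates a short RNA with and without the shift. It is a toy model under stated assumptions, not the procedure used in this study: the 23-nt test sequence is invented, the function names are hypothetical, the shift is assumed to occur every time (in reality only a fraction of ribosomes shift), and the ribosome is assumed to re-read the final C of UUUAAAC as the first base of the next codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna, start=0):
    """Translate from `start` until a stop codon or the end of the RNA."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_minus1_frameshift(rna, slippery="UUUAAAC", start=0):
    """Translate in frame 0 through the slippery site, then shift back 1 nt."""
    site = rna.find(slippery, start)
    # The zero-frame codons must read X XXY YYZ, i.e. U UUA AAC.
    while site != -1 and (site + 1 - start) % 3 != 0:
        site = rna.find(slippery, site + 1)
    if site == -1:
        return translate(rna, start)
    shift_at = site + len(slippery) - 1          # index of the final C
    upstream = translate(rna[:shift_at + 1], start)
    if len(upstream) < (shift_at + 1 - start) // 3:
        return upstream                          # stop codon before the site
    return upstream + translate(rna, shift_at)   # re-read C in the -1 frame


toy_rna = "AUGGCUUUAAACGGGUAAGCUGA"              # invented test sequence
print(translate(toy_rna))                         # frame-0 product
print(translate_with_minus1_frameshift(toy_rna))  # frameshifted product
```

In this toy case the frame-0 product terminates at the first in-frame stop codon, whereas the frameshifted product reads through it, which mirrors the relationship between pp1a and pp1ab.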

Comparison of pp1a and pp1ab of TCoV with those of other
coronaviruses indicated that the TCoV polyprotein is
processed into 15 non-structural proteins (nsp2-nsp16; Fig. 1(b)
and Table 3) by polyprotein-encoded viral proteinases. One 3C-like
proteinase (3CLpro) was predicted to reside in nsp5 on the basis of
conserved residues responsible for 3CLpro activity (Supplementary
Fig. S2) (Ziebuhr et al., 2000); one papain-like proteinase (PLpro)
was identified in nsp3 on the basis of conserved PLP residues (CHD)
(Supplementary Fig. S3). As in another group 3 coronavirus, IBV, only
one active PLpro was predicted for TCoV. TCoV nsp3 is organized
like nsp3 of IBV in that the Ac domain, X
domain (ADPR), and Y domain were all present and arranged in
the same order (Fig. 1(b)). Comparison of the amino acid sequence
of each TCoV nsp with those of other coronaviruses predicted
several putative enzymatic activities; among them, the enzymatic
activities of nsp2, nsp5 (Supplementary Fig. S3), nsp13
(Supplementary Fig. S5), nsp14 (Supplementary Fig. S6), and nsp15
(Supplementary Fig. S7) have been confirmed experimentally in other
coronaviruses (Bhardwaj et al., 2004; Eckerle et al., 2007; Fang
et al., 2006; Graham et al., 2005; Kiemer et al., 2004). Nsp8, nsp9,
and nsp10 were predicted to have RNA binding activity (Egloff et al.,
2004; Matthes et al., 2006; Zhai et al., 2005). Nsp12 was predicted
to be the major RdRp (Supplementary Fig. S4), though its activity
has not been experimentally confirmed. Nsp16 was predicted to be a
2'-O-methyltransferase (Supplementary Fig. S8).


Table 3

Polyprotein mapping for TCoV ATCC

Cleavage product  Polyprotein  Position on polyprotein  Size (aa)  Cleavage by  Potential function
nsp2              pp1ab/pp1a   M1-G673                  673        PLP          -
nsp3              pp1ab/pp1a   G674-G2267               1,594      PLP          TM1, PLpro, ADRP
nsp4              pp1ab/pp1a   G2268-Q2781              514        PLP/3CLpro   TM2
nsp5              pp1ab/pp1a   A2782-Q3088              307        3CLpro       3CLpro
nsp6              pp1ab/pp1a   S3089-Q3385              297        3CLpro       TM3
nsp7              pp1ab/pp1a   S3386-Q3468              83         3CLpro       -
nsp8              pp1ab/pp1a   S3469-Q3678              210        3CLpro       dsRNA binding; RdRp?
nsp9              pp1ab/pp1a   N3679-Q3789              111        3CLpro       TM4; ssRNA binding
nsp10             pp1a         S3790-Q3934              145        3CLpro       Zinc-binding; ssRNA binding
nsp11             pp1ab        S3935-G3957              23         3CLpro       -
nsp12             pp1ab        S3935-Q4875              941        3CLpro       RdRp
nsp13             pp1ab        S4876-Q5476              601        3CLpro       Helicase
nsp14             pp1ab        G5477-Q5997              521        3CLpro       Exoribonuclease
nsp15             pp1ab        S5998-Q6335              338        3CLpro       NendoU
nsp16             pp1ab        S6336-M6637              302        3CLpro       2'-O-Methyltransferase


Table 4

Sequence identity of TCoV-ATCC nsps with other coronaviruses
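
The sizes in Table 3 can be cross-checked against the position ranges. The Python sketch below is offered only as a consistency check; the data are transcribed from Table 3 and the helper name `span_size` is illustrative. It parses each range, recomputes the size, and confirms that the cleavage products add up to the reported lengths of pp1ab (6637 aa) and pp1a (3957 aa).

```python
import re

# (cleavage product, position on polyprotein, reported size) from Table 3.
TABLE3 = [
    ("nsp2", "M1-G673", 673), ("nsp3", "G674-G2267", 1594),
    ("nsp4", "G2268-Q2781", 514), ("nsp5", "A2782-Q3088", 307),
    ("nsp6", "S3089-Q3385", 297), ("nsp7", "S3386-Q3468", 83),
    ("nsp8", "S3469-Q3678", 210), ("nsp9", "N3679-Q3789", 111),
    ("nsp10", "S3790-Q3934", 145), ("nsp11", "S3935-G3957", 23),
    ("nsp12", "S3935-Q4875", 941), ("nsp13", "S4876-Q5476", 601),
    ("nsp14", "G5477-Q5997", 521), ("nsp15", "S5998-Q6335", 338),
    ("nsp16", "S6336-M6637", 302),
]


def span_size(span):
    """Size in aa of a range such as 'G674-G2267' (inclusive)."""
    match = re.fullmatch(r"[A-Z](\d+)-[A-Z](\d+)", span)
    if match is None:
        raise ValueError(f"unrecognized span: {span}")
    first, last = map(int, match.groups())
    return last - first + 1


mismatches = [(name, span_size(span), size)
              for name, span, size in TABLE3 if span_size(span) != size]

# pp1ab comprises every product except nsp11; pp1a ends with nsp10 and nsp11.
pp1ab_total = sum(size for name, _, size in TABLE3 if name != "nsp11")
pp1a_products = ["nsp%d" % i for i in range(2, 12)]
pp1a_total = sum(size for name, _, size in TABLE3 if name in pp1a_products)

print("mismatches:", mismatches)
print("pp1ab total:", pp1ab_total, "pp1a total:", pp1a_total)
```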

3.3. Genome organization of TCoV 

The first two full-length TCoV genome sequences, for
prototype ATCC and field isolate 540, are reported here. The complete
genome sequences were obtained by assembly of the polyprotein gene
sequences determined in this report by direct sequencing of cloned
RT-PCR products and the published structural gene sequences
of the same isolates from our lab (Lin et al., 2004; Loa et al., 2006).
Both 5' and 3' UTR sequences were determined by RACE and used to
assemble the full-length genomic sequences. The reported genomic
sequences were 27,817 nucleotides (nt) for ATCC and 27,749 nt for
540, excluding the poly(A) tail. For both TCoV isolates, the
nucleotide composition was 29% A, 33% U, 22% G, and
16% C. A + U was 62%, indicating that the genome of TCoV is
AU rich. The genome nucleotide sequence identity between 540
and ATCC was 92.8% by ClustalW. 540 and ATCC shared nucleotide
sequence identities of 86.9% and 87.5% with IBV, respectively.
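
A nucleotide composition of this kind can be computed in a few lines. The following Python sketch shows one way to do so; the function names are illustrative and the 100-nt test string is synthetic, constructed only to reproduce the reported percentages (it is not TCoV sequence).

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G, and U in a sequence (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in "ACGU")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or U")
    return {base: 100.0 * counts[base] / total for base in "ACGU"}


def au_content(seq):
    """Combined percentage of A and U."""
    comp = base_composition(seq)
    return comp["A"] + comp["U"]


# Synthetic 100-nt string with the reported composition, for illustration.
synthetic = "A" * 29 + "U" * 33 + "G" * 22 + "C" * 16
print(base_composition(synthetic))
print("A + U:", au_content(synthetic))
```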

Fig. 3. Northern blotting of total RNA isolated from TCoV-infected turkey small
intestines. Total RNA (10 µg) was isolated from mock- or ATCC-infected turkey small
intestines 3 days post infection, separated on a 1% agarose gel, transferred onto a
nitrocellulose membrane, and detected with a 32P-labeled PCR probe corresponding to
the N gene. The sizes (kb) on the right indicate the predicted sizes of genomic and
subgenomic RNA. X indicates assumed DI RNA. Lane labels: M and ATCC.

Analysis of the genome organization of the TCoV-ATCC isolate
revealed a 64-nt (1-64) leader sequence within
the 5' UTR of 530 nt. As found in other coronaviruses (Brian and
Baric, 2005), the 5' UTR of TCoV encoded an ORF of 11 amino acids
(Supplementary Table S2). The ORF finder at NCBI
revealed 13 putative ORFs in the genomes of TCoV
isolates ATCC and 540: 1a, 1b, 2 (spike), 3a, 3b,
3c (envelope), 4a (matrix), 4b, 4c, 5a, 5b, 6a (nucleocapsid), and 6b
(Fig. 1(a); Tables 1 and 2). ORF 4b came immediately after the matrix
gene, and 6b immediately followed the N gene. Comparison with
another group 3 coronavirus, IBV-Beaudette, showed that
4c and 6b were not present in IBV-Beaudette (Fig. 1(a)). The prediction
of 6b was not expected. Downstream of the N gene, the nucleotide sequences
of TCoV and IBV-Beaudette were highly conserved (Supplementary
Fig. S9). However, there was no ORF in this region of IBV, so the
3' UTR of IBV was over 500 nt. In both isolates of TCoV, a 74-aa
ORF (6b) was predicted in this region despite the nucleotide
sequence conservation between TCoV and IBV. The prediction of
ORF 6b reduced the potential 3' UTR of TCoV to less than 301 nt,
compared with 506 nt in IBV. Whether the
proteins of 4b, 4c, and 6b are produced requires experimental
confirmation. A consensus octanucleotide motif, GGAAGAGC,
was found 72 nt upstream of the poly(A) tail in the 540 and ATCC
genomes. In mouse hepatitis virus, the octanucleotide
motif was found to be unnecessary for virus replication in vitro, but
a deletion mutant showed reduced replication in mouse brain,
suggesting that the octanucleotide motif affects pathogenesis (Goebel
et al., 2007). A consensus transcription-regulating sequence (TRS),
CUUAACAAA, was found at the 3' end of the genome leader
(nt 1-64) and in front of each structural gene and major accessory
gene with either an exact match (sg3-6) or one mismatch (sg2). A
total of five sgRNAs were predicted for production of structural and
accessory proteins from the genome of TCoV.
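
Locating a consensus TRS with an exact match or a single mismatch amounts to a sliding-window comparison. A minimal Python sketch of such a search follows; it is an illustration rather than the tool used in this study, the function name is hypothetical, and the short test sequence is invented so that it contains one exact copy of CUUAACAAA and one single-mismatch copy (CUGAACAAA).

```python
def find_motif(seq, motif="CUUAACAAA", max_mismatches=1):
    """Return (1-based position, matched window, mismatch count) for each hit."""
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


# Invented test sequence: exact TRS, spacer, then a one-mismatch TRS.
toy_genome = "GG" + "CUUAACAAA" + "AUGC" + "CUGAACAAA" + "UU"
for position, window, mismatches in find_motif(toy_genome):
    print(position, window, mismatches)
```

Setting `max_mismatches=0` restricts the search to exact matches, which in the TCoV genome would correspond to the leader and sg3-6 sites described above.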




3.4. Phylogenetic analysis of 1ab

The ClustalW program was used to analyze the relationship
between TCoV pp1ab and the pp1ab of other coronaviruses. Table 4
summarizes the amino acid sequence identity of nsps
between TCoV ATCC and other coronaviruses.
TCoV and IBV shared the highest sequence identity for all nsps
when compared with other coronaviruses. Tree-top software
was used to draw phylogenetic trees
(http://www.genebee.msu.su/services/phtree_full.html). Fig. 2 shows the result of the
phylogenetic analysis of pp1ab. TCoV was grouped with IBV in
group 3. A close examination of TCoV and IBV pp1ab
showed that the matrix distance between the two TCoV strains
(0.047) was longer than that between TCoV and IBV (0.045). ClustalW
analysis of pp1ab, 3CLpro, RdRp, and helicase of TCoV and IBV
showed sequence similarities of 97.97%, 94.09%, 98.5%, and 97.66%,
respectively.
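
For orientation, percent identity between two aligned sequences can be computed as sketched below in Python. This is a simplified illustration and not ClustalW's own scoring: the function name is hypothetical, the two short aligned peptides are invented, and the denominator is assumed to be the full alignment length including gapped columns (alignment programs differ on this point, so values may not match ClustalW output exactly).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(a == b and a != gap for a, b in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


# Invented aligned peptides: 7 columns, one gap, one substitution.
row_1 = "MKT-LLA"
row_2 = "MKTALVA"
print(round(percent_identity(row_1, row_2), 2))
```

Similarity scores additionally count conservative substitutions as matches according to a substitution matrix, which this sketch does not attempt.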

3.5. Subgenomic mRNA detection for TCoV 


Fig. 2. Phylogenetic relationship between pp1ab of TCoV and other
coronaviruses. The tree was generated by Phylip at
http://www.genebee.msu.su/services/phtree_full.html. Sequences of pp1ab of coronaviruses were used for
analysis. The sequence accession numbers are listed in Section 2.7.


Based on the locations of the TRS on the genome, it was predicted
that five subgenomic mRNAs would be produced for translation of
the structural and accessory genes (Fig. 1).
Fig. 4. Sequences flanking the TRS region for TCoV sgRNAs. The partial sequences shown are for each sgRNA of the ATCC isolate. For each sgRNA, partial genomic leader (gL) and body (gS,
gE, gM, g5, and gN) sequences are displayed above and below the sgRNA. The star (*) indicates an identical nucleotide and the box indicates the TRS region where the template switch is
assumed to occur.


To confirm the predicted sgRNA production for TCoV, total RNA was
isolated from mock- or ATCC-infected turkey small intestines and
used for Northern blotting with a 32P-labeled PCR probe specific
for the N gene. Fig. 3 shows 7 RNA bands detected in the
ATCC-infected sample but not in the mock-infected sample, indicating
the specificity of the probe. Based on the predicted sizes of genomic
and subgenomic RNA, one band was assigned to genomic RNA and
five bands were assigned to sgRNA 2-6 for expression of the S, E, M,
5, and N proteins. One extra band, smaller than
genomic RNA, was assumed to be a defective interfering (DI) RNA. DI
RNA has been detected in other coronavirus-infected cells and is
assumed to be a product of template switching during replication.

Because the TRS in an sgRNA could be derived from a template switch
between the leader and body TRS, we aimed to determine the potential
switch position by analyzing sequences flanking the TRS region in
each sgRNA. Fig. 4 summarizes the partial sequences flanking the
TRS region for each sgRNA. The TRS (CUUAACAAA)
of the S gene sgRNA was identical to the TRS of the leader,
but differed from the body TRS by one nucleotide (CUgAACAAA).
This suggested that the template switch was downstream of CUU
on the leader TRS. For the remaining sgRNAs, the TRS was the same in
the sgRNA, the leader, and the body, implying that the template switch could
have occurred anywhere within CUUAACAAA. As expected, genes
3a, 3b, and 3c (E) shared the same sgRNA for translation; genes 4a
(M), 4b, and 4c shared the same sgRNA; genes 5a and 5b shared the
same sgRNA; genes 6a (N) and 6b shared the same sgRNA. Whether
the predicted 3a, 3b, 4b, 4c, 5b, and 6b are expressed, and hence
their biological functions during replication and pathogenesis,
requires experimental confirmation.
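
The reasoning used above to narrow the template-switch position can be expressed as a small comparison: only positions where the leader and body TRS differ are informative, and at each such position the sgRNA reveals which template was copied. The Python sketch below illustrates this logic; the function name and return format are illustrative, and the inputs are the three TRS sequences quoted in the text for the S gene sgRNA.

```python
def informative_positions(leader_trs, body_trs, sgrna_trs):
    """List (1-based position, source) where leader and body TRS differ."""
    if not (len(leader_trs) == len(body_trs) == len(sgrna_trs)):
        raise ValueError("TRS sequences must have equal length")
    result = []
    columns = zip(leader_trs.upper(), body_trs.upper(), sgrna_trs.upper())
    for position, (lead, body, sg) in enumerate(columns, start=1):
        if lead != body:
            if sg == lead:
                source = "leader"
            elif sg == body:
                source = "body"
            else:
                source = "neither"
            result.append((position, source))
    return result


# S gene sgRNA: the sgRNA matches the leader at the single variable position.
print(informative_positions("CUUAACAAA", "CUgAACAAA", "CUUAACAAA"))
# Identical leader and body TRS: no position is informative.
print(informative_positions("CUUAACAAA", "CUUAACAAA", "CUUAACAAA"))
```

For the S gene the third nucleotide derives from the leader, consistent with a switch downstream of CUU; an empty result corresponds to the remaining sgRNAs, for which the switch could lie anywhere within the TRS.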

4. Conclusion 

In conclusion, the completed TCoV polyprotein gene
sequence and the assembly of the first full-length genomes of TCoV
support the classification of TCoV as a group 3 coronavirus. The
completed genome sequences of two TCoV isolates will aid our
understanding of coronaviruses in terms of molecular evolution and
molecular pathogenesis. They will also provide a strong basis for the
development of updated molecular diagnostics and recombinant
or DNA-based vaccines for the control and prevention of TCoV
infection in turkey flocks.

Appendix A. Supplementary data 

Supplementary data associated with this article can be found, 
in the online version, at doi:10.1016/j.virusres.2008.04.015. 